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Abstract 

In most domains of network analysis researchers consider networks that arise in nature 
with weighted edges. Such networks are routinely dichotomized in the interest of using 
available methods for statistical inference with networks. The generalized exponential 
random graph model (GERGM) is a recently proposed method used to simulate and 
model the edges of a weighted graph. The GERGM specifies a joint distribution for an 
exponential family of graphs with continuous-valued edge weights. However, current 
estimation algorithms for the GERGM only allow inference on a restricted family of 
model specifications. To address this issue, we develop a Metropolis-Hastings method 
that can be used to estimate any GERGM specification, thereby significantly extending 
the family of weighted graphs that can be modeled with the GERGM. We show that 
new flexible model specifications are capable of avoiding likelihood degeneracy and 
efficiently capturing network structure in applications where such models were not 
previously available. We demonstrate the utility of this new class of GERGMs through 
application to two real network data sets, and we further assess the effectiveness of 
our proposed methodology by simulating non-degenerate model specifications from the 
well-studied two-stars model. A working R version of the GERGM code is available in 
the supplement and will be incorporated in the gergm CRAN package. 
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1 Introduction 


Throughout the sciences, but particularly in the social sciences, a fundamental tool for the 
statistical analysis of networks has been the exponential random graph model (ERGM) - a 
popular, powerful, and flexible tool for statistical inference with network data (Holland and 
Leinhardt, 1981; Wasserman and Pattison, 1996; Snijders et ah, 2006). Despite their pop¬ 
ularity, conventionally used ERGMs have the major limitation that they require the edges 
of an observed network be binary (representing the presence or absence of an edge). Thus 
ERGMs cannot directly model weighted networks. Since many substantively important net¬ 
works are weighted, this restriction is especially problematic. Weighted networks arise, for 
example, in the study of financial exchange (Iori et al., 2008), migration patterns (Chun, 
2008), and in the analysis of brain functionality and connectivity (Simpson et al., 2011). 
Recently, some progress on modeling weighted networks in the ERGM framework was 
made in Desmarais and Cranmer (2012), where the generalized exponential random graph 
model (GERGM) was proposed to study networks with continuous-valued edges. Around 
the same time, Krivitsky (2012) proposed the weighted exponential random graph model 
that generalized the ERGM to networks with integer-valued edges. Robins et al. (1999) de¬ 
veloped logistic dyad-independent models for networks with integer-valued edges. Though 
each of these models provide a means to analyze weighted networks, we will focus on 
extensions to the GERGM. 

In general, the likelihood function of an ERGM is intractable (though some recent 
progress has been made in the large network n —» oo limit (Chatterjee et al., 2013; Lubetzky 
and Zhao, 2014)); however, efficient estimation can be achieved through the use of Markov 
Chain Monte Carlo (MCMC) algorithms (Geyer and Thompson, 1992; Hunter and Hand- 
cock, 2006) . MCMC can be used to simulate samples of networks from which the likelihood 
function of an ERGM can be approximated. Like the ERGM, estimation of the GERGM is 
readily achieved via MCMC algorithms. Desmarais and Cranmer (2012) proposed a Gibbs 
sampling technique for GERGM estimation; however, this strategy limits the specification of 
network dependencies captured by the GERGM to those for which full conditional edge dis¬ 
tributions can be derived in closed form. Another important obstacle that arises in discrete 
exponential family model specification is the problem of degeneracy, a condition under 
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which only a few network configurations - usually very sparse and very dense networks - 
have high probability mass (Handcock et ah, 2003; Rinaldo et ah, 2009; Schweinberger, 
2011). The issue of degeneracy strongly influences the effectiveness of an MCMC algorithm. 
Indeed, in the case that nearly empty (or nearly complete) networks are most probable, 
estimation via MCMC will fail to converge to consistent parameter estimates. 

Here, we expand the family of weighted networks that can be analyzed under the 
GERGM by developing a Metropolis-Hastings sampling procedure that allows the flexible 
specification of network statistics and models under the GERGM framework. Perhaps the 
greatest drawback of the limited set of models for which Gibbs Sampling can be used to 
simulate networks is that they are prone to degeneracy. This is due to the fact that the 
closed-form derivation of the conditional distribution of an edge requires that the network 
statistics used to specify the GERGM depend linearly on the value of each edge. GERGM 
specifications that include nonlinear statistics are often required to avoid degeneracy. A 
significant advantage of our proposed Metropolis-Hastings (MH) procedure is that one 
can use MH sampling to estimate models that involve nonlinear network statistics. The 
expanded set of GERGM specifications made available with the use of MH can be used 
to find a non-degenerate model specification. Furthermore, in models where the Gibbs 
sampler can be used, Metropolis-Hastings yields the same parameter estimates as those 
obtained via Gibbs. The framework established here provides an important step in flexibly 
modeling and simulating weighted networks while further providing a means of avoiding 
model degeneracy. 

In Section 2, we describe the generalized exponential random graph model for graphs 
with continuous-valued edges. In Section 3, we discuss the Monte Carlo maximum like¬ 
lihood estimation of the GERGM and briefly describe the Gibbs procedure devised in 
Desmarais and Cranmer (2012). At the end of Section 3, we formulate a flexible Metropolis- 
Hastings sampling procedure. We propose a class of model specifications in Section 4 that 
expands the family of GERGMs beyond those permissible under Gibbs sampling. In Section 
5, we evaluate the performance and potential utility of our proposed framework through 
application to the U.S. state-to-state migration network, an international financial exchange 
network, as well as through a simulation study that revisits the degenerate two-star-model 
of Handcock et al. (2003). We conclude with a discussion of open problems and future 
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work in Section 6. 


2 The Generalized Exponential Random Graph Model 

Consider a directed network defined on a node set [n] = {1,2,..., n}, where m = n(n — 
1) denotes the total number of directed edges between these nodes. Suppose that the 
weighted relationships between the nodes are represented by a collection of weights (yij : 
i + j e [n]) e W n . The aim of this section is to describe a specific class of probability 
models on M m as constructed in Desmarais and Cranmer (2012) called GERGMs that 
incorporates relational structure between the nodes to generate a random vector Y e M m . 
This probability distribution is specified by a joint probability density function (pdf) /y(y, 0) 
driven by real-valued parameters 0. 

A GERGM for the observed configuration y has a simple generative process that relies on 
two distinct steps. First, a joint distribution that captures the structure and interdependence 
of Y is defined on a restricted network configuration, X e [0, l] m . Next, the restricted 
network X is transformed onto the support of Y through an appropriate transformation 
function. These two steps are closely related to the widely studied specification of joint 
distributions via copula functions (Genest and MacKay, 1986). We now describe the two 
steps in specifying a GERGM in more detail. 

In the first specification step, a function of network summary statistics h : [0, l] m —> 
is formulated to represent the joint features of X. The random vector X is modeled by an 
exponential family with parameters 6 e W as follows: 


fx{x,Q) = 


exp (6'h{x)) 


x e [0, l] r 


( 1 ) 


$ [0jl]m exp (0'h(z))dz’ 

where 6' denotes the transpose of the vector 6. The network specification in model (1) is 
closely related to the usual specification of exponential random graph models on binary 
edges with the exception that individual edges are now modeled as having continuous 
weights taking values between 0 and 1. As dependence relationships can be captured by 
functions of edges valued on the unit interval, model (1) provides a flexible specification 
of interdependence. For instance, networks generated by a highly reciprocal process are 
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likely to exhibit high values of Yh < 3 x ij x ji> and those for which there is a high variance in 
the popularity of vertices (e.g., preferential attachment) are likely to exhibit high values 
of the “two-stars” statistic k^i x ji x ki (Park and Newman, 2004a). We describe several 
flexible network statistics for modeling interdependence in Section 4. Note that the uni¬ 


form distribution on [0, l] m is a special case of the model above, obtained by setting the 


parameters 0 = 0. 

In the second specification step, a one-to-one and coordinate wise monotonically non¬ 
decreasing function T” 1 : [0, l] m —» R m is formulated to model the transformation of the 
restricted network X onto the support of Y. Specifically, for each pair of distinct nodes i, 
j e [n], we model Y VJ = T^ 1 (X, (3) where (3 e M fc parameterizes the transformation so as to 
capture the marginal features of Y. Since T l is a monotonically non-decreasing, the pdf 
of Y is given by 



m 


( 2 ) 


where t l3 (y, (3) = dT t] {y, j3)/dyij. Though the choice of T -1 is flexible, specifying T~ l so that 
T^ 1 is an inverse cumulative distribution function (cdf) is advisable because the properties 
of (2) are difficult to understand without this restriction and because it leads to several 
beneficial properties. First, when T" 1 is an inverse cdf, is precisely a marginal pdf for 
all i Y j. Second, when 6 = 0, then f Y (y,6,(3) reduces to a product of marginal pdfs 
{tij} and thus in this special case one obtains a model with dyadic independence acorss 
edge weight distributions. An important example includes taking T _1 as the inverse of a 
Gaussian cdf with constant variance. In this special case, if 9 = 0 then (2) reduces to a 
model for conditionally independent Gaussian observations, such as ordinary least squares 
regression. 


3 Model Inference 


The GERGM specification in equations (1) and (2) can be used to readily model a wide 
range of network interdependencies in weighted networks. In this section, we describe 
maximum likelihood inference of the parameters 6 and (3 via MCMC. We review the Gibbs 
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sampling procedure in Desmarais and Cranmer (2012), which relies on an important restric¬ 
tion of model specification. We then develop a general inferential framework for sampling 
via Metropolis-Hastings, which extends the family of GERGM specifications. We provide 
pseudo-code for the MCMC maximum likelihood estimation procedure described in Sec¬ 
tions 3.1 - 3.4 in the Appendix. 

3.1 Maximum Likelihood Inference 

Given a specification of statistics h(-), transformation function T -1 , and observations Y = y 
from the distribution (2), our goal is to find the maximum likelihood estimates (MLEs) of 
the unknown parameters 0 and (3, namely to find values 6 and (3 that maximize the log 
likelihood: 

=^{T{y,P))-logC{0)+Yj\og t ii{y,P), (3) 

ij 

where 

C( 0 ) = f exp(0'h (z))dz. 

J[0,l] m 

The maximization of (3) can be achieved through alternate maximization of (3\9 and 
0 1/3. In particular, one can calculate the MLEs 0 and (3 by iterating between the following 
two steps until convergence. 

For r > 1, iterate until convergence: 

1. Given 0( r \ estimate / 3 h)| 0 b’) : 

f3 {r) = argrnax^ ^ (r) h (T(y,P)) +Y l \ogt ij (y,f3) S j . (4) 


2. Set a: = T(y,(3 ^). Then estimate 0 ( r +O |/ 3 h) : 

0 ( r +i) _ ar g ma x 0 ^0'h(£) — log C(9)^. (5) 

For fixed 9, the likelihood maximization in (4) is straightforward and can be accom¬ 
plished numerically using gradient descent (Snyman, 2005). In the case that tij is log- 
concave and h o T is concave in (3, a hill climbing algorithm is assured to find the global 
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optimum. 

The maximization in (5) is a difficult problem due to the intractability of the normaliza¬ 
tion factor C(9). There has been much recent work on circumventing the intractability of 
C{9). For example, Strauss and Ikeda (1990) consider using the maximim pseudo-likelihood 
estimate (MPLE) for 9, which assumes independence of the edges in the graph. Van Duijn 
et al. (2009) shows, however, that using the MPLE is often biased and far less efficient 
than the maximum likelihood estimate especially when strong network dependencies are 
present. In light of the inefficiency of pseudo-likelihood estimates, we turn to MCMC meth¬ 
ods for estimating (5) which have witnessed considerable success in estimating exponential 
family models (Geyer and Thompson, 1992; Hunter and Handcock, 2006). We describe 
the MCMC framework for estimating 9 and then review the constrained Gibbs procedure 
developed in Desmarais and Cranmer (2012) before introducing our new more flexible 
Metropolis-Hastings procedure. 


3.2 Monte Carlo Maximization in the GERGM 

Let 9 and 9 be two arbitrary vectors in W and let C(-) be defined as in (3). The crux of 
optimizing (5) via Monte Carlo simulation relies on the following property of exponential 
families (Geyer and Thompson, 1992): 


C(9) 

C\9) 


Ei 


exp 


((tf-tf)Tl(A')) 


( 6 ) 


The expectation in (6) is not directly computable; however, a first order approximation to 
this quantity is given by the first moment estimate: 


E;, 


exp ((0 - 0)'h(X 


M 


j=l 


jj 2 exp ((0-0) /h O£ 


0) 


(7) 


where ..., x^ is an observed sample from pdf fx{-, 9). 

Define £{9\x) := 0h(x) — logC(0). Then maximizing £(9\x) with respect to 9 e W is 
equivalent to maximizing £{9\x) — £{9\x) for any fixed arbitrary vector 9 e W. Equations (6) 
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and (7) suggest: 


( 1 1V1 \ 

— ^exp ^(0 — 0)'h(x( J '))'J J . (8) 

An estimate for 6 can now be calculated by the maximization of (8). The r + 1st iteration 
estimate 9^ r+v > in 5 can be obtained using Monte Carlo methods by iterating between the 
following two steps: 

Given (3^ r \ 0^ r \ and x = T(y,(3 

1. Simulate networks x^\ ..., x ^ from density f x (x , 0^). 

2. Update: 


^ M 


9 {r+1) = argrnaxg ( 0'h(x) - log ( — £ exp {{9 - 0 w )'h 


3 = 1 



(9) 


Given observations Y = y, the Monte Carlo algorithm described above requires an 
initial estimate (3^ and 0 ^. We initialize (3 0 using (4) in the case that there are no network 
dependencies present, namely, (3^ = argmax^l^V log (3)}. We then fix x obs = T(y, f3 0 ), 
and use the Robbins-Monro algorithm for exponential random graph models described in 
Snijders (2002) to initialize 0^\ This initialization step can be thought of as the first step 
of a Newton-Raphson update of the MPLE estimate 6 MPLE on a small sample of networks 
generated from the density fx{x 0 b s ,9 MPLE ). 

The first step of the Monte Carlo algorithm requires simulation from the density fx(x, 9 ^). 
As this density cannot be directly computed, one must rely on the use of MCMC methods, 
such as Gibbs or Metropolis-Hastings samplers, for estimation. 


3.3 Simulation via Gibbs Sampling 

The Gibbs sampling procedure described in Desmarais and Cranmer (2012) provides a 
straightforward way to estimate 9 through the iterative optimization of (8); however, its use 
restricts the specification of network statistics h(-) in the GERGM formulation. In particular, 
the use of Gibbs sampling requires that the network dependencies in an observed network 
y are captured through x by first order network statistics, namely statistics h(-) that are 
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linear in x,j for all i,j e [n]. With this assumption, one can derive a closed-form conditional 
distribution of X {J given the remaining network, X_yy, which is used in Gibbs sampling. 

Let f Xij \x_ w) {xij, 0) denote the conditional pdf of X tJ given the remaining restricted 
network X_^y Consider the following condition on h(x): 


d 2 h.(x) 

dxfj 


0, i,je[n] 


( 10 ) 


Assuming that (10) holds, one can readily derive a closed form expression for f x , 


fx i:j \x_ (ij) (xij,6) - 


. d'dhlx) 

ex P 


Qi dhte) 

dxij 


-1 r 


exp(#'3gl) - 1 


( 11 ) 


Let U be uniform on (0,1). Using the conditional density in (11), one can simulate 
values of x e iteratively by drawing edge realizations of A^|X_(y) according to the 
following distribution: 


X tj \X_ 


log 


1 + U 


exp(0'^' 


dxi 


a/OhC 


x 




i dh(x) 
dxii 


dx 


¥= 0 


( 12 ) 


y 


When 0 ld -^~ = 0, all values in [0,1] are equally likely; thus, X l3 \X_yy is simply drawn 
uniformly from support [0,1]. The Gibbs simulation procedure simulates network sam¬ 
ples x^\ ... from f x {x,9) by sequentially sampling each edge from its conditional 
distribution given in (12). 

Assumption (10) greatly restricts the class of models that can be fit under the GERGM 
framework. To appropriately fit structural features of a network such as the degree dis¬ 
tribution, reciprocity, clustering or assortative mixing, it may be necessary to use network 
statistics that involve nonlinear functions of the edges. Under Assumption (10), nonlin¬ 
ear functions of edges are not permitted - a limitation that may prevent theoretically or 
empirically appropriate models of networks in many domains. Furthermore, as we will 
demonstrate in our numerical study, exponentially weighted network statistics like those 
in Table 1 can provide a means to flexibly model networks. This is particularly benefi¬ 
cial in cases where a theoretically appropriate non-degenerate model cannot be identified 
within the restricted class of GERGMs. To incorporate the aforementioned statistics and 
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extend the class of available GERGMs, we develop a general inferential framework via 
Metropolis-Hastings that is applicable to any GERGM specification. 


3.4 A General Inferential Framework via Metropolis-Hastings 

An alternative and more flexible way to sample a collection of networks from the density 
fx(x,9) is the Metropolis-Hastings procedure. The Metropolis-Hastings procedure that we 
propose samples the t + 1st network, x^ t+l \ via a truncated multivariate Gaussian proposal 
distribution q(-\x^) whose mean depends on the previous sample x® and whose variance 
is a fixed constant cr 2 . 

The truncated Gaussian is a convenient and commonly used proposal distribution for 
bounded random variables such as those on the [0,1] interval with which we are working 
(see, e.g., Browne (2006); Claeskens et al. (2010); Muller (2010); Neelon et al. (2014); 
Franks et al. (2014)). The advantage of the truncated Gaussian over the obvious alternative 
for bounded random variables - the Beta distribution - is that it is straightforward to 
concentrate the density of the truncated Gaussian around any point within the bounded 
range. For example, a truncated Gaussian with /t = 0.75 and cr = 0.05 will result in proposals 
that are nearly symmetric around 0.75 and stay within 0.6 and 0.9. In practice, we found the 
shape of the Beta distribution to be less amenable to precise concentration around points 
within the unit interval, which leads to problematic acceptance rates in the Metropolis- 
Hastings algorithm. 

We say that re is a sample from a truncated normal distribution on [a, b] with mean p 
and variance a 2 (written W ~ TN( a ,b)(p, cr 2 )) if the pdf of W is given by: 


g w (w\ix,a 2 ,a,b) 




a ^ w ^ b 


where </>(■(p, a 2 ) is the pdf of a iV(p, a 2 ) random variable and $(■) is the cdf of the standard 
normal random variable. To ease notation, we write the truncated normal density on the 
unit interval as 


q a {w\x) = g w (w\x, a 2 ,0,1) 


(13) 
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This will be our proposal density. Denote the weight between node i and j for sample t by 
xfj. The Metropolis-Hastings procedure we employ generates sample x ( t+1 ) sequentially 
according to an acceptance/rejection algorithm. The t + 1st sample a;( t+1) is generated as 
follows. 

1. For i,j e [n], generate proposal edge x-* ~ q a {w\xi^) independently across edges. 

2. Set 


where 


x (t+1) = < 


x(t) = ( x ij)i,Mn\ W -P‘ p(x {t \x^) 

X® w.p. 1 — p(x^\x^) 


p(x, y) = min (fl 1 


l fx( x \0) J E f n] qAVijM’ 


min [ exp (tf'(h (y) - h(x))) Y[ 

i,je[n 


Qcr(. x ij | Vij) 
Quijjij | x ij) 


(14) 


The acceptance probability p(x ( - t \x^) can be thought of as a likelihood ratio of the 
proposed network given the current network and the current network given the proposal 
x^\ Large values of p{x^\x^>) suggest a higher likelihood of the proposal network. It is 
readily verified that the resulting samples {x^\t = 1,..., M } form a Markov Chain whose 
stationary distribution is the target pdf f x (- 10). 

The proposal variance parameter a 2 influences the average acceptance rate of the 
Metropolis-Hastings procedure described above. Indeed, the value of a 2 tends to be in¬ 
versely related to the average acceptance rate of the algorithm. Roberts et al. (1997) 
analyzed the efficiency of general random walk Metropolis algorithms and found that an ac¬ 
ceptance rate of 0.234 optimized the convergence rate of this class of algorithms. Following 
their heuristic, we suggest tuning a 2 so that the average acceptance rate is approximately 
0.25. 1 

1 We introduce this criterion as a heuristic for MH sampling for GERGM, since the conditions outlined by 

Roberts et al. (1997) for 0.234 to be optimal do not apply to sampling from GERGM. 
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The Metropolis-Hastings algorithm requires specification of an initial sample To 
this end, we sample from a collection of independent uniform random variables on 
the unit interval. We set a sufficient burn-in so that the resulting chain of M samples have 
converged. To test the convergence of the samples, we use the Geweke dignostic test for 
stationarity (Geweke, 1991) on the network statistics associated with the collection of 
samples. Furthermore, traceplots of the network statistics can be used to readily surveil 
the convergence of the network samples. We illustrate how to diagnose convergence in the 
numerical study in the Appendix. 

4 Flexible Model Specification 

In the context of the dichotomous ERGM, a substantial literature has arisen around how to 
best formulate network statistics that represent important generative relational processes 
such as transitivity, balance, and preferential attachment (Wasserman and Pattison, 1996; 
Park and Newman, 2004b; Snijders et al., 2006; Hunter et al., 2008). The initial devel¬ 
opment of ERGM specifications focused on local subgraph counts, such as the number of 
two-stars and triangles, that implied straightforward conditional distributions for each tie 
given the rest of the network (i.e., Markov graphs (Frank and Strauss, 1986)). Intermediate 
extensions of the standard suite of network statistics used in ERGM specifications focused 
on more advanced or higher-order subgraph counts (Pattison and Robins, 2002), reflecting 
longer paths and clique-like structures among node sets. 

Unfortunately, in most cases, these motif-count specifications lead to empirically im¬ 
plausible models due to the problem of degeneracy. Snijders et al. (2006) and Hunter 
et al. (2008) propose the use of geometrically decreasing weights in the calculation of 
statistics for transitivity, and for in- and out-degree distributions. The down-weighting in 
these statistics takes effect as a single node or edge is involved in many subgraph motifs 
(e.g., the contribution to the transitivity statistic from the first shared partner to two nodes 
incident to an edge, is more than four times the contribution of the fourth shared partner). 
These geometrically weighted specifications were shown to avoid degeneracy with much 
greater success than models specified with simple local subgraph counts. The geometrically 
weighted shared partners (GWESP) statistics from Snijders et al. (2006) and Hunter et al. 
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(2008) reduces the weight of high order statistics in an ERGM and reduces the compu¬ 
tational complexity of typical subgraph counting. Wyatt et al. (2010) suggest using the 
geometric mean of subgraphs as the measure of “subgraph intensity” for network statistics. 

In the GERGM framework, we specify statistics that correspond to the subgraph config¬ 
urations that have proven fruitful in specifying binary-valued ERGMs. Though virtually any 
network statistic can be used in a GERGM specification, we focus on a flexible, two-pronged, 
weighting scheme that dampens the extremes that arise through summed subgraph prod¬ 
ucts. The geometric mean suggested in Wyatt et al. (2010) can be seen as dampening the 
change in subgraph sums with respect to subgraph product values by exponentiating the 
subgraph product to an exponent between 0 and 1. The first prong in our weighted spec¬ 
ifications can be considered a generalization of the geometric mean. That is, we suggest 
exponentiating each sub-graph by exponent (a e (0,1]) before summing over all subgraphs. 
We refer to this as a-inside weighting. The second prong in our specifications represents an 
extension of the triangle model specification in Lubetzky and Zhao (2015). Lubetzky and 
Zhao (2015) show that raising the triangle density to an exponent greater than zero, but 
less than 2/3 leads to an ERGM specification that is asymptotically distinguishable from 
Erdos-Renyi random graphs, which is not true of the conventionally-specified (i.e., non- 
exponentiated) ERGM statistics. We refer to the latter prong as the a-outside specification. 

Aside from providing different empirical fit, the a-outside model leads to a more com¬ 
plicated pattern of dependence among the ties, with all ties dependent upon each other, 
to a degree. The o-inside weighting leads to the local dependence common to ERGMs, in 
which the change statistics (i.e., derivatives of h with respect to edge values in the GERGM) 
depend upon edges in which an edge is embedded in subgraphs relevant to the statistics. 
Formally, as long as the statistics being raised to a are sub-graph products, the a-inside 
weighting leads to a Markov graph (Frank and Strauss, 1986) form of the GERGM, in 
which the joint density of the constrained (i.e., quantile) graph factorizes to a product over 
functions of sub-graphs. Frank and Strauss (1986), drawing on the Hammersley-Clifford 
theorem, discuss how ERGM specifications that factorize by sub-graphs exhibit local depen¬ 
dence in which edges depend only on neighbors within the subgraphs. Since it does not 
factorize by sub-graphs, the a-outside specification leads to global dependence, in which 
the change statistics depend upon the local edges as well as the global network statistic val- 
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ues. This is readily observed by considering the derivative of a statistic weighted according 
to the o-outside specification with respect to a change in an edge X %3 . Let h a {X) = h(X) a , 
then 

dh a a dh 

dX~ 3 = hX^dX^' C ^ 

We see here that the change statistic with respect to an edge increases with the values 
of the edges that are local to the edge in a given network statistic (i.e., but decreases 
with the global value of the network statistic (i.e., /r 1_ "). The decrease with the global value 
of the statistic is a dampening effect according to which the tendency to form dense motifs 
lessens with the average/total density of those motifs across the network. We consider 
these two approaches to dampening the combinatorial growth in network statistic values, 
and show that each method can be used to avoid degeneracy in GERGM. We note that in 
principle one can specify any suite of network statistics for a GERGM specification. In this 
work, we specifically consider o-outside specification using the statistics described in Table 
1. In Section 5, we show that our chosen flexible network statistics provide a means to 
avoid degeneracy in the GERGM and capture relevant network motifs in application. 

5 Applications 

We assess the performance and utility of our proposed Metroplis-Hastings procedure for 
the GERGM using real and simulated networks. First, we analyze an application in which 
the Metropolis-Hastings sampler can be used to fit non-degenerate model specifications in 
a situation where the Gibbs sampler is not available. For this, in Section 5.1 we analyze an 
international lending network that contains the aggregate bank lending volume between 
17 large industrialized nations in 2005. In Section 5.2 we analyze the U.S. state migration 
network from 2006 to 2007. In this example, we validate our Metropolis-Hastings proce¬ 
dure by numerically comparing its estimates with those obtained from the Gibbs approach 
in Desmarais and Cranmer (2012). In Section 5.3 we explore the utility of flexible model 
specification for a directed variant of the two-star model (Handcock et ah, 2003). In binary 
networks, the two-star model is known to be prone to dengeneracy given small changes 
in its parameter values (Park and Newman, 2004c). Our simulation study suggests that 
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Network Statistic Parameter 


Value 


Reciprocity 

Or 

2 x v x 3i 



\i<j J 



Transitive Triads 


$tt S (XijXjkXik T XijXkjXki “f ^‘ijXkjXik) ~f 


i<j<k 


QJT 


(XjiXjkXki T XjiXjkXik ~f XjiXkjXki) 

i<j<k 


Table 1: Summary of network statistics used in the specification of a GERGM in this work. These 
are the a-outside specification of five commonly-used network statistics. 


one can easily identify non-degenerate GERGM specifications for a weighted version of the 
two-star model. Importantly, we show that under certain weightings, Metropolis-Hastings 
can simulate networks with any desired edge density and clustering structure. The R code 
and all of the data used in this section are available in the online supplement. 


5.1 International Lending Network 

Our first application of the GERGM is to the network of aggregate private and public lending 
between 17 large industrialized nations in 2005. Weighted directed edges between nations 
represent the total monetary volume, in millions of U.S. dollars, that was loaned from one 
nation to another. Figure 1 illustrates this weighted network. This data was collected by the 
Bank for International Settlements (BIS) and a descriptive analysis was originally published 
in Oatley et al. (2013). To the best of our knowledge, there have been no published studies 
of international lending using statistical network models. There are numerous theoretical, 
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exploratory, and descriptive analyses on international lending as a network phenomenon 
(Niemira and Saaty, 2004; Nier et al., 2007; Rodriguez, 2007; Gai and Kapadia, 2010; 
Amini et al., 2013; Billio et al., 2012), especially in the wake of the 2008 financial crisis. 
One particular challenge in this network is the heavy tailed nature of the lending volumes 
(with the majority of lending concentrated between Germany, Great Brittan, Japan, and 
the United States). We first apply an ln(x+l) transformation on all aggregate lending 
flows between countries - a standard practice in international finance applications - and 
subsequently model the transformed edge weights using the GERGM. 



Figure 1: Network plot of the international aggregate interbank lending network. Darker edges 
indicate a larger volume of lending. 

We control for several important exogenous predictors in the GERGM specification. In 
particular, we include sender and receiver effects for the (natural log) gross domestic prod¬ 
uct (GDP) as we expect countries with larger economies to both lend and borrow more 
than those with smaller economies. We also include network predictors that represent (nor¬ 
malized) aggregate trade volume between countries, as well as the (normalized) number 
of inter-governmental organization (IGO) co-memberships. We expect that countries that 
trade more with one another will also lend more with one another, and that those countries 
that share a larger number of co-memberships in IGOs will also be more likely to lend 
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more frequently with one another due to their increased diplomatic cooperation. Finally, 
we include mixing matrices parameterizing the propensity for countries to lend to each 
other based on G8 membership 2 . We chose T as the cdf of a Student’s t distribution with 
one degree of freedom, whose median is a linear regression on the specified exogenous 
predictors. 

In addition to the exogenous predictors discussed above, we also include structural 
network predictors, including mutual dyads, transitive triads, and out two-stars statistics 
in our model. These statistics allow us to test for the presence of mutuality, clustering, and 
economies of scale in lending, all of which are theoretically important in the international 
trade (Oatley et al., 2013). Although this specification includes a number of exogenous 
covariates for control, we find that GERGM model with no a down weighting exhibited 
degeneracy. To address this degeneracy, we considered an exponentially weighted model 
using statistics from Table 1. We used the Metropolis-Hastings procedure to estimate the 
GERGM where network predictors were down-weighted by ct R = a TT = o 0 ts = 0.8. The 
0.8 value was selected because it was the lagest value for which we could consistently 
estimate a non-degenerate model across multiple runs of estimation. We optimized the 
Metropolis-Hastings proposal variance at each step in the estimation process (with a target 
acceptance rate of 0.25 + 0.05) initialized a burn-in of 400,000 full network samples, and 
then sampled 800,000 networks from which we thinned the resulting sample by keeping 
every two hundredth network. The average acceptance rate was approximately 0.22. The 
resulting parameter estimates are given in Table 2. To assess convergence of the estimated 
models, we simulated 800,000 networks and compared the distribution of the mutual dyads, 
transitive triads, and out two-stars statistics to the observed values in the lending network. 
Further, we investigated the goodness of fit of our model by comparing the distributions of 
the simulated and observed in two-stars, cyclic triads, and network density distributions. 
These results are shown in Figure 2. 

As we can see from Figure 2, our model has appeared to have converged based on the 
distributions of the transitive triads, mutual dyads, and out two-stars statistics. In terms of 
goodness of fit, we see that our model provides a very good fit for the observed network in 

2 The G8 member countries are Canada, France, Germany, Italy, Japan, Russia, the United Kingdom and 
the United States. 
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Parameter 

Statistic Estimate (s.e.) 


Transitive Triads 

1.387 

(0.478) 

Out Two Stars 

-2.645 

(0.811) 

Mutual Dyads 

7.023 

(2.900) 

G8 Sender, G8 Receiver 

1.505 

(0.186) 

G8 Sender, Non-G8 Receiver 

0.804 

(0.177) 

Non-G8 Sender, G8 Receiver 

1.480 

(0.147) 

IGO Co-members 

1.390 

(0.210) 

log(GDP) Sender 

0.606 

(0.126) 

log(GDP) Receiver 

4.887 

(1.040) 

Normalized Net Exports 

-0.068 

(0.046) 


Table 2: Estimates of the network parameters of the GERGM model when fit to the international 
lending network via the Metropolis-Hastings procedure. 


terms of cyclic triads; however, the in-two-stars and network density values are somewhat 
overestimated. Geweke statistics and trace plots of the network density for 800,000 net¬ 
works simulated from the fitted GERGM specification via Metropolis-Hastings simulation 
also indicate that the model has converged (see Figure 6 in the Appendix). The exogenous 
covariate parameter estimates from our model largely conform to our theoretical expecta¬ 
tion, although it is interesting that we see a small (negative) parameter estimate for the 
effect of trade, indicating that there is not a particularly strong relationship between trade 
and lending in our network, when controlling for other economic and network factors. We 
observe positive and statistically significant transitivity and reciprocity effects, and a nega¬ 
tive out-two stars effect. These results are interesting because previous studies, including 
Oatley et al. (2013), have argued that the international lending network is hierearchical - 
a property that does not match our results as we would expect to see a positive out two- 
stars parameter estimate. Further exploration of this finding is outside of the scope of this 
discussion, but should be considered in future research. 

To further asses the potential for degeneracy in our model, we performed a hysteresis 
analysis similar to that described in Snijders et al. (2006) for each structural parameter 
estimate. Starting with a sparse network and holding all other parameter estimates at their 
posterior means, we varied each structural parameter estimate ten standard deviations 
below to ten standard deviations above its posterior mean and simulated 500,000 networks 
using our Metropolis-Hastings procedure from each parameter value combination (with 
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Figure 2: Convergence and goodness of fit plots for the model fitted to the international lending 
network. For each fit model, 800,000 networks were simulated using the Metropolis-Hastings 
sampling procedure and the final exogenous and structural parameter estimates. Each box plot 
compares the quantiles of simulated networks (with mean statistic values at the red line) to the 
observed network statistic (blue line). Here, the In 2-Stars and Network Density statistics were not 
included in the fitted model. 


a burnin of 500,000). We changed the structural parameter value 0.5 standard deviations 
at each iteration of this process, for a total of 41 parameter values. For each new value 
of the parameter, we used the final network from the M-H simulation using the previous 
parameter value as the initialization for the M-H simulation for the new specified parameter. 
We plotted the mean network density against the parameter values in order to asses the 
potential for jumps in the network density that might indicate an underlying issue with 
model degeneracy. Figure 3 shows the hysteresis plots for our model, and these plots do 
not indicate any obvious issues with degeneracy in this specification. 


5.2 U.S. Migration Network 

We next apply the GERGM to the U.S. migration network analyzed in Desmarais and Cran- 
mer (2012). We note that this application is used for validation of our Metropolis-Hastings 
procedure; indeed, we compare the estimates obtained with Metropolis-Hastings directly 
with the estimates obtained from the Gibbs sampler for the same GERGM specification. 

Historically, interstate migration has played an important role in the understanding of 
local financial markets, public infrastructure, and the political climate within each state 
(Clark and Ballard, 1981; Levine and Zimmerman, 1999; Gimpel and Schuknecht, 2001). 
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Figure 3: Hysteresis plots for the international trade network. Shaded regions cover +1.96 standard 
deviations in the simulated network densities for that parameter value. The vertical black line 
indicates the parameter value from the main estimation step and the horizontal black line indicates 
the observed network density. 

The network that we model contains 51 nodes that represent the 50 U.S. states as well as 
Washington, D.C. Directed edges are placed between states in which there was a change in 
interstate migration flow from 2006 and 2007. The weight, y l]; associated with the directed 
edge from node i to node j is the total change in interstate migration from state i to state j 
between 2006 to 2007. This data set also contains ten demographic exogenous predictors 
that further describes the pairwise relationships between states. The predictors describe 
the geographic distance, and the sender and receiver effects of high January temperature, 
income, unemployment, and population of the states. Like the application in Section 5.1, 
we chose T as the cdf of a Student t distribution with one degree of freedom, whose median 
is a linear regression on the specified demographic predictors. 

We incorporated network statistics that represent reciprocity, cyclic triads, in-two-stars, 
out-two-stars, and transitive triads in our GERGM specification, and following the model 
fit in Desmarais and Cranmer (2012) we used no a down-weighting. 

We fit the above model using both the Metropolis-Hastings sampling procedure and 
Gibbs. For Gibbs, we use 50,000 simulated networks with a set burn-in of 10,000 networks 
in each iteration. We optimized the Metropolis-Hastings proposal variance at each step 
in the estimation process (with a target acceptance rate of 0.25) initialized a burn-in 
of 1,000,000 full network samples, and then sampled 2,000,000 networks from which 
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we thinned the resulting sample by keeping every one thousandth network. The average 
acceptance rate was approximately 0.24. The parameter estimates and associated standard 
errors for each method are shown on in Table 3. 


Statistic 

Transitive Triads 
Cyclic Triads 
Out Two Stars 
In Two Stars 
Mutual Dyads 
Unemployment Sender 
Unemployment Receiver 
Mean January Temp. Sender 
Mean January Temp. Receiver 
Population Size Sender 
Population Size Receiver 
Mean Income Sender 
Mean Income Receiver 
Distance 
Dispersion 


M-H Parameter 
Estimate (s.e.) 
0.074 (0.053) 
-0.206 (0.042) 
0.017 (0.044) 
-0.029 (0.039) 
-0.131 (0.348) 
27.163 (13.481) 
-3.673 (12.382) 
-11.031 (14.474) 
-15.147 (13.609) 
1.806 (20.264) 
-35.532 (16.127) 
2.349 (11.613) 
-1.129 (10.652) 
7.081 (11.917) 
5.942 (0.029) 


Gibbs Parameter 
Estimate (s.e.) 
0.078 (0.053) 
-0.204 (0.040) 
0.011 (0.043) 
-0.030 (0.040) 
-0.107 (0.338) 
27.402 (13.463) 
-3.475 (12.476) 
-11.167 (14.452) 
-15.101 (13.713) 
1.744 (20.244) 
-35.282 (16.215) 
2.220 (11.583) 
-0.969 (10.735) 
7.218 (11.970) 
5.942 (0.029) 


Table 3: Estimates of the network parameters of the GERGM model when fit to the U.S. migration 
network via the Metropolis-Hastings and Gibbs sampler procedures. 


Table 3 reveals that the Metropolis-Hastings and Gibbs procedures provide comparable 
estimates for each of the modeled predictors. This suggests that each method simulates 
from the same distribution, as expected. Furthermore, these fitted GERGM reveals three 
interesting, perhaps expected, trends in the data: (i) there was increased migration away 
from states with high unemployment, (ii) there was decreased migration to states with a 
large population, and (iii) there was decreased migration to states with high unemploy¬ 
ment. See Desmarais and Cranmer (2012) for a more detailed discussion of these results. 
We provide further estimation diagnostics for the Metropolis-Hastings procedure in the 
Appendix. 


5.3 Non-Dengenerate Specifications of the Two-Star Model 

In our simulation study, we consider fitting a GERGM to a directed and weighted variant of 
the two-star model. Consider an edge configuration x e [0, l] m . We model the occurrence 
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of x as a function of its edges and in-two-stars: 


fx(x,0,a) 


exp ( 9 E h E (x ) + 9 1TS h ITS (x, a)) 
C(9 E , #its) 


X 6 [0, l] m 


(16) 


h E (x) = YjXij/m 
h ITS (x ,«) = I 2 x i iXki ) ’ 

\ i j<k=£i / 

where C{0 El 0 ITS ) is the normalizing constant that ensures f x {x,9) integrates to one, h E {pc) 
is the edge density of x, and h IT s(x , 1) is the in-two-star density of x. When a = 1, model 
(16) is the directed and weighted version of the two-star model considered in Handcock 
et al. (2003). Model (16) is closely related to the triangle model from Jonasson (1999); 
Haggstrom and Jonasson (1999) and the widely used Ising model for lattice processes. We 
will refer to model (16) as the weighted in-two-stars model. 

The unweighted two-star model is a well-known example that suffers from likelihood 
degeneracy (see Handcock et al. (2003) or Snijders et al. (2006) for instance). In this 
simulation study, we empirically analyze model (16) following a similar study as that 
described in Snijders et al. (2006). We find that, surprisingly, the weighted in-two-stars 
model does not demonstrate the typical signs of degeneracy. We now describe the simulation 
study and our findings. 

We first fix the edge density parameter d E at -2 and a value of a between 0 and 1. We 
then simulate one million size 10 networks following model (16) for each integer value of 
6*i T s between -10 and 10 using the MH sampler. We calculate the mean edge density and 
the mean in-two-stars value from the million samples at each value of 0 ITS . We repeat this 
procedure for a values of 0.10, 0.25, 0.50, 0.75, 0.90, and 1. The results are reported in 
Figure 4. 

We see from Figure 4 that for a = 1, there is a large jump in the value of the in-two- 
stars and edge density statistics between ^its = 0 and ^n-s = 1. As a decreases, the relative 
magnitude of this jump decreases and the statistics’ curves are relatively smooth over 
changes in 6* ITS . When a is too small (s$ 0.50), the in-two-stars statistic value approaches 
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In-Two-Stars 


Edge Density 




Figure 4: The mean value of the in-two-stars statistic and the edge density of one million simulated 
networks of the in-two-stars model with 9e = —2 and integer values of 9ns from -10 to 10. The 
mean values are shown for a values of 0.10, 0.25, 0.50, 0.75, 0.90, and 1. 

one and the edge density only changes slightly across values of 0 ITS . The jump phenomenon 
was also witnessed for binary networks in the two-stars model in Snijders et al. (2006), 
where it was observed that this model specification was most prone to degeneracy issues. 
As a consequence, one may expect the empirical distribution of the in-two-stars and edge- 
density statistics in the neighborhood of 0 ITS = 0 and # ITS = 1 to be bimodal at large values 
of a. 

To investigate whether this is the case, we performed a more fine grained grid search 
for the value of # ITS at which the edge density of the network changed the most, for each 
value of a. We found that, for example, when a = 0.5, the steepest change in the edge 
density occurred at approximately 0 IT s = 0.55. Similarly, for a = 0.75 and a = 1, the 
steepest changes in the edge density occurred at approximately 0 ITS = 0.65 and 0 ITS = 0.75, 
respectively. We show the empirical distribution for both of the statistics for a values of 
0.50, 0.75, and 1.00 at the values of ^its given above, in Figure 5. Figure 5 suggests that 
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these distributions are not bimodal for these values of a, including a = 1. We also evaluated 
these distributions for all other values of 0i TS in our simulations and found similar results. 
Furthermore, these results are not sensitive to the value of d E , as it serves only to shift 
the curves depicted in Figure 4 left or right. These findings suggest that the weighted 
in-two-stars does not suffer from the same degeneracy issues as its binary counterpart. 


a = 0.50 


a = 0.75 


a = i.oo 



Simulated In-Two-Stars Value 




Simulated Network Density 




Simulated Network Density 


Simulated Network Density 


Figure 5: The empirical (scaled) frequency distribution of the edge density and the in-two-stars 
value for the in-two-stars model at (frrs = {0.55,0.65,0.75} (from left to right). One million networks 
were simulated and the values for each statistic is shown over all networks. 


In summary, this simulation study provides insights into two important features of the 
GERGM specification of the two-stars model. First, the weighted in-two-stars models does 
not appear to suffer from degeneracy at any value of a. This surprising result is contrary to 
the well-known unweighted two-stars model. This simulation also gives some intuition as 
to how to choose the tuning parameter a. Small values of a (^ 0.50) dampened the effect 
of the in-two-stars statistic too drastically and are therefore not suggested. We encourage 
using values of a between 0.5 and 0.9 as these values lead to decreased sensitivity of the 
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GERGM model to parameter changes. 


6 Discussion 

We have proposed, explicated, and demonstrated several advances in the statistical mod¬ 
eling of weighted networks by substantially increasing the utility of the GERGM. These 
extensions to the GERGM, taken together, represent a significant increase in the model’s 
capabilities, such that it is now possible to use nearly any model specification for inference 
on continuous-valued weighted graphs. 

First, we have proposed and implemented a Metropolis-Hastings algorithm for fitting 
GERGMs. In the original development of the GERGM, Desmarais and Cranmer (2012) 
proposed a Gibbs sampling strategy to estimate the model. Elowever, this approach is 
limited by the fact that fairly strict constraints are placed on the set of network statistics 
that can be used in the model. Our Metropolis-Hastings procedure relaxes these restrictions 
and allows one to use the full set of possible specifications for the model. 

Second, we have proposed an approach to dampening the extreme values often pro¬ 
duced by subgraph sums and thus avoiding model degeneracy. This dampening technique, 
because it is critical in avoiding degenerate model specifications in the GERGM, allows ana¬ 
lysts to specify a practical and diverse set of endogenous effects as part of the network data 
generating process. Though this may seem a simple extension to the means by which statis¬ 
tics are computed on the network, this weighting strategy is important because degeneracy 
is a major obstacle to estimation of inferential models on real-world networks. 

We consider two approaches to network statistic dampening—one in which subgraph- 
specific sums are raised to a fractional exponent (i.e., a-inside dampening), and a stronger 
approach that involves raising the sum over all subgraphs to a fractional exponent (i.e., 
a-outside dampening). It is important to re-iterate that, while the a-inside approach con¬ 
forms to the local dependence that is typical to ERGM formulations, the a-outside approach 
induces global dependence in that each tie variable depends to some degree on the value of 
every other tie variable in the network. We see from Equation 15 that the a-outside formu¬ 
lation exhibits a form of dependence similar to the a-inside formulation in that high edge 
values are more likely if they contribute to local configurations that are themselves high 
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and associated with positive parameter values. However, the o-outside formulation exhibits 
an additional form of dependence in that the likelihood of high edge values embedded in 
high value configurations decreases as the global sum over the respective configuration 
type increases. This global dampening in the o-outside model is inversely related to the 
value of a. Though the a-inside and ct-outside formulations may both be considered in 
efforts to avoid degeneracy, we note that researchers should also consider how the choice 
between these two formulations affects the interpretation of results. 

Though we have presented important innovations here, much work remains. Specifically, 
we have just scratched the surface when it comes to model selection and specification for 
GERGMs. First, both in Desmarais and Cranmer (2012) and in the current study, the statis¬ 
tics used to specify the GERGM have represented straightforward functional adaptations 
of the statistics commonly used for binary ERGMs. Future research should consider suites 
of statistics that are applicable to the special case of weighted networks. Furthermore, our 
approach to weighting the subgraph products requires a choice of a that will rarely be theo¬ 
retically informed. In our simulation study, we analyzed the effects of a on the sensitivity of 
the two-stars model and found encouraging results for a e (0.5,1]. In principle, one could 
use an alternative data-driven approach that chooses a based on goodness of fit summaries. 
We plan to investigate this more fully in future work. Finally, the results in the simulation 
study gave empirical evidence in one well-studied model that the GERGM does not suffer 
from degeneracy like its binary ERGM counterpart. We aim to theoretically formalize these 
findings in future work. 
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Appendix 

A: Pseudo-code for MCMC Maximum Likelihood Estimation of the GERGM 


Algorithm 1: GERGM MCMCMLE 
Data: Data and parameters 
y <— vector of edge weights 

Toll <— tolerance level for GERGM estimation convergence 

Tol2 < - tolerance level for Metropolis-Hastings algorithm convergence 

N <— number of network samples generated for each iteration of Metropolis-Hastings sampler 

h <- vector of functions to calculate network statistics 

T <— transformation function 

c <— shape parameter # Default = 1 

t <— 0 # iteration number 

A! <- 1000 

A 2 <- 1000 


while Ai > Toll do 

t + +; # Increment iteration 

if t = 1 then 

| Estimate /3 t via MPLE 
else 

Estimate /3 t via Gradient Descent 
Estimate Theta via Algorithm 2 to obtain 9 t 

end 


if lA-i 111||^-i III > Othen 


At =3 


1A - A-illl , IIA - A-iP 


IIA- 


+ 


1112 


IIA- 


r—1II2 


se 


end 


Ai — \ [|A — A— 1 IIi + IIA — A-illl] 


end 

return (J3 t , A) 
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Algorithm 2: Estimate Theta 

# Initialize parameters 

x = T(y, f3 t ) # current estimate of edge weights based on estimate of /3 t 
d 0 = 0 t -1 # initialization of coefficient vector 
r = 0 # iteration number 
m = length(y) # Number of edges 


while A 2 > Tol2 do 

r + + # Increment iteration 

Simulate a sample xq of length m from density: 

f(x\9 r _i) = ^P&r-iHx)] , c(d r _ i) = f exp [9' r _ 1 h{z)]dz 
6 (d 0 ) J[o,i] M 


Run Metropolis Hastings Update via Algorithm 3 to obtain samples x\,... ,xn- 
# Update 9 r 

9 r = argmax e — log[C(d)] via Gradient Descent, where 

N 


C{ 0 ) = — 2 ex P [( 0 “ Or-l)'h(Xj)] 


3 = 1 


# Calculate distance from previous step 
if ||d r _i||| > 0 then 

9 r — 9 r —i II 


A 2 = 


se 


end 


II #r-til 

9 r — 1||2 


end 


return 9 r 
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Algorithm3: Metropolis Hastings Update 

Data: Data and Parameters 

a 2 : variance of Truncated Normal density # Default = 1 

x° Initial weighted network sample (all edges have values between 0 and 1. 

n : total number of nodes in the network 


# loop over number of MH network samples 
for k = 0; k < N; k+ + do 

# Draw a single proposal sample 
for i = 0; i < n; i+ + do 
for j = 0; j < n; j+ + do 

Sample a proposed edge weight Wij from a truncated normal distribution centered at the 
previous iteration’s edge weight 


TN (. 


4“V 2 ) 


Calculate the probability of the proposed edge weight w,,- under the truncated normal 
distribution centered around the current edge weight (xv -1 ). 

p(w)ij = P (w < Wij | x.f _ 1 ,cr 2 ) 


Calculate the probability of the current edge weight (a:*- 1 ) under a truncated normal 
distribution centered around the proposed edge weight (w,,). 

p(x)ij = P (w < 4" 1 | w^, a 2 ) 

end 

end 

Set w as the proposed network and Xk-i as the previous iteration’s network. 

Calculate the following log joint pdfs: 


p x = J] 

i 3 
k k 

p w = ££iog(p(t«,) 0 ) 

* 3 

Calculate a = (P x - P w ) + 9' [h(wij) — h(x^ -1 )] 

Sample u from a Uniform(0,l) density 

if loq(u) ^ a then 
I x k = w 

else 

| %k = %k— 1 

end 

end 


return xi,...,xjy 


B: GERGM Fit Diagnostics 

To evaluate convergence of the Metropolis-Hastings procedure on the international lending 
network and the U.S. Migration data, we evaluate the trace plot for the network density 
of the simulated networks over 800,000 simulated networks. We show this plot in Figure 
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6. For the U.S. Migration data, we simulated 100,000 networks with 10,000 burn-in. The 
modeled statistics, as well as the MCMC traceplot for the network density for this data 
are shown in Figure 7. Visual inspection as well as the Geweke convergence test statistic 
suggest that the sampler has converged. 


Trace Plot of Density for Simulated Networks 
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Figure 6: International lending network trace plot for the network density of the simulated networks 
over 800,000 simulations from Metropolis-Hastings. 
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Figure 7: U.S. migration network trace plot for the network density of the simulated networks from 
100,000 simulations from Metropolis-Hastings. 
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